Fastest Convergence for Q-learning
Authors
Abstract
The Zap Q-learning algorithm introduced in this paper is an improvement of Watkins’ original algorithm and recent competitors in several respects. It is a matrix-gain algorithm designed so that its asymptotic variance is optimal. Moreover, an ODE analysis suggests that the transient behavior is a close match to a deterministic Newton-Raphson implementation. This is made possible by a two time-scale update equation for the matrix gain sequence. The analysis suggests that the approach will lead to stable and efficient computation even for non-ideal parameterized settings. Numerical experiments confirm the quick convergence, even in such non-ideal cases. The comparison plot on this first page, taken from Fig. 9 of this paper, is an illustration of the amazing acceleration in convergence using the new algorithm. A secondary goal of this paper is tutorial. The first half of the paper contains a survey on reinforcement learning algorithms, with a focus on minimum variance algorithms.
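The matrix-gain, two time-scale idea mentioned in the abstract can be illustrated with a minimal tabular sketch. Everything specific below is an assumption made for illustration and is not taken from the abstract: the function name `zap_style_q_learning`, the toy model `P_toy` and `cost_toy`, the cost-minimizing convention, the step sizes `alpha = 1/(n+1)` and `gamma = (n+1)^(-rho)`, and the exact form of the recursion. It should be read as a sketch in the spirit of the abstract, not as the paper's implementation.

```python
import numpy as np

def zap_style_q_learning(P, cost, beta=0.9, n_steps=2000, rho=0.85, seed=0):
    """Tabular matrix-gain Q-learning sketch with two step-size sequences.

    P[a, s, s2] : transition probabilities of a hypothetical toy model
    cost[s, a]  : one-step cost (the agent minimizes discounted cost)
    """
    rng = np.random.default_rng(seed)
    n_actions, n_states, _ = P.shape
    d = n_states * n_actions
    theta = np.zeros(d)      # Q(s, a) is stored at index s * n_actions + a
    A_hat = -np.eye(d)       # gain-matrix estimate; -I gives a Watkins-style step
    s = 0
    for n in range(1, n_steps + 1):
        a = int(rng.integers(n_actions))            # uniformly random exploration
        s2 = int(rng.choice(n_states, p=P[a, s]))   # sample the next state
        q_next = theta[s2 * n_actions:(s2 + 1) * n_actions]
        a2 = int(np.argmin(q_next))                 # greedy (cost-minimizing) action
        i, j = s * n_actions + a, s2 * n_actions + a2
        td = cost[s, a] + beta * theta[j] - theta[i]   # temporal-difference term

        # Rank-one sample psi_i (beta * psi_j - psi_i)^T with indicator features.
        A_n = np.zeros((d, d))
        A_n[i, j] += beta
        A_n[i, i] -= 1.0

        alpha = 1.0 / (n + 1)          # slower time scale: parameter update
        gamma = (n + 1.0) ** (-rho)    # faster time scale: matrix-gain update
        A_hat += gamma * (A_n - A_hat)

        # Newton-Raphson-like step: theta <- theta - alpha * A_hat^{-1} * psi_i * td
        rhs = np.zeros(d)
        rhs[i] = td
        theta = theta - alpha * np.linalg.solve(A_hat, rhs)
        s = s2
    return theta.reshape(n_states, n_actions)

# Hypothetical 3-state, 2-action model used only to exercise the sketch.
P_toy = np.array([
    [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9]],
    [[0.1, 0.9, 0.0], [0.0, 0.1, 0.9], [0.9, 0.0, 0.1]],
])
cost_toy = np.array([[1.0, 2.0], [0.5, 1.5], [0.0, 1.0]])
Q_est = zap_style_q_learning(P_toy, cost_toy)
```

In this sketch, `gamma` decays more slowly than `alpha`, so the matrix estimate `A_hat` adapts on a faster time scale than the parameter `theta`. That is one common way to realize a two time-scale recursion. With `A_hat` frozen at `-I`, the update reduces to a scalar-gain step of the Watkins type.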
Similar resources
Robust optimal design and convergence properties analysis of iterative learning control approaches
In this paper, we address four major issues in the field of iterative learning control (ILC) theory and design. The first issue is concerned with ILC design in the presence of system interval uncertainties. Targeting time-optimal (fastest convergence) and robustness properties concurrently, we formulate the ILC design into a min–max optimization problem and provide a systematic solution for li...
Indoor UAV Navigation to a Rayleigh Fading Source Using Q-Learning
Unmanned aerial vehicles (UAVs) can be used to localize victims, deliver first-aid, and maintain wireless connectivity to victims and first responders during search/rescue and public safety scenarios. In this letter, we consider the problem of navigating a UAV to a Rayleigh fading wireless signal source, e.g. the Internet-of-Things (IoT) devices such as smart watches and other wearables owned b...
Further study on $L$-fuzzy Q-convergence structures
In this paper, we discuss the equivalent conditions of pretopological and topological $L$-fuzzy Q-convergence structures and define $T_{0},~T_{1},~T_{2}$-separation axioms in $L$-fuzzy Q-convergence space. Furthermore, $L$-ordered Q-convergence structure is introduced and its relation with $L$-fuzzy Q-convergence structure is studied in a categorical sense.
Language Evolution on a Dynamic Social Network
We study the role of the interaction network in a collaborative learning model known as the classification game. This involves a population of agents collaboratively learning to solve a classification task. We have previously shown that interaction while learning allows the agents to converge upon a simpler solution, which in turn guarantees better generalization. This clearly shows the benefi...
Stratified $(L,M)$-fuzzy Q-convergence spaces
This paper presents the concepts of $(L,M)$-fuzzy Q-convergence spaces and stratified $(L,M)$-fuzzy Q-convergence spaces. It is shown that the category of stratified $(L,M)$-fuzzy Q-convergence spaces is a bireflective subcategory of the category of $(L,M)$-fuzzy Q-convergence spaces, and the former is a Cartesian-closed topological category. Also, it is proved that the category of stratified $...
Journal: CoRR
Volume: abs/1707.03770
Publication date: 2017